Fleshing it out: A Supervised Approach to MWE-token and MWE-type Classification
نویسندگان
چکیده
Although some multiword expressions (MWEs) like How do you do? have exclusively idiomatic meaning, other MWEtypes like the phrase kick the bucket may be idiomatic or literal depending on context. The recently developed OpenMWE corpus provides the largest freely available collection of annotated MWE-tokens suitable for supervised classification, but so far its potential has only been superficially investigated and only for classification of MWE-types in the corpus. Instead, we train and evaluate classifiers for crosstype classification and introduce novel features specialised to this task. Our best crosstype classifiers performed as well on non-trained MWE-types as a majority class baseline which has knowledge of the MWE-type.
منابع مشابه
Verb Noun Construction MWE Token Classification
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. We present a supervised learning approach to the problem. We experiment with different features. Our approach yields the best results to date on MWE c...
متن کاملA Word Embedding Approach to Identifying Verb-Noun Idiomatic Combinations
Verb–noun idiomatic combinations (VNICs) are idioms consisting of a verb with a noun in its direct object position. Usages of these expressions can be ambiguous between an idiomatic usage and a literal combination. In this paper we propose supervised and unsupervised approaches, based on word embeddings, to identifying token instances of VNICs. Our proposed supervised and unsupervised approache...
متن کاملCombining resources for MWE-token classification
We study the task of automatically disambiguating word combinations such as jump the gun which are ambiguous between a literal and MWE interpretation, focusing on the utility of type-level features from an MWE lexicon for the disambiguation task. To this end we combine gold-standard idiomaticity of tokens in the OpenMWE corpus with MWE-type-level information drawn from the recently-published JD...
متن کاملjMWE: A Java Toolkit for Detecting Multi-Word Expressions
jMWE is a Java library for implementing and testing algorithms that detect Multi-Word Expression (MWE) tokens in text. It provides (1) a detector API, including implementations of several detectors, (2) facilities for constructing indices of MWE types that may be used by the detectors, and (3) a testing framework for measuring the performance of a MWE detector. The software is available for fre...
متن کاملConstruction of an English Dependency Corpus incorporating Compound Function Words
The recognition of multiword expressions (MWEs) in a sentence is important for such linguistic analyses as syntactic and semantic parsing, because it is known that combining an MWE into a single token improves accuracy for various NLP tasks, such as dependency parsing and constituency parsing. However, MWEs are not annotated in Penn Treebank. Furthermore, when converting word-based dependency t...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011